Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations
نویسندگان
چکیده
Background In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while uncovering the unknown network as little as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make a strong assumption that the samples are in stationary regime, and hence are impelled to leave out the samples collected during the burn-in period. Methods This work proposes two sampling schemes without burn-in time constraint to estimate the average of an arbitrary function defined on the network nodes, for example, the average age of users in a social network. The central idea of the algorithms lies in exploiting regeneration of RWs at revisits to an aggregated super-node or to a set of nodes, and in strategies to enhance the frequency of such regenerations either by contracting the graph or by making the hitting set larger. Our first algorithm, which is based on reinforcement learning (RL), uses stochastic approximation to derive an estimator. This method can be seen as intermediate between purely stochastic Markov chain Monte Carlo iterations and deterministic relative value iterations. The second algorithm, which we call the Ratio with Tours (RT)-estimator, is a modified form of respondent-driven sampling (RDS) that accommodates the idea of regeneration. Results We study the methods via simulations on real networks. We observe that the trajectories of RL-estimator are much more stable than those of standard random walk based estimation procedures, and its error performance is comparable to that of respondent-driven sampling (RDS) which has a smaller asymptotic variance than many other estimators. Simulation studies also show that the mean squared error of RT-estimator decays much faster than that of RDS with time. Conclusion The newly developed RW based estimators (RL- and RT-estimators) allow to avoid burn-in period, provide better control of stability along the sample path, and overall reduce the estimation time. Our estimators can be applied in social and complex networks.
منابع مشابه
Leveraging History for Faster Sampling of Online Social Networks
With a vast amount of data available on online social networks, how to enable efficient analytics has been an increasingly important research problem. Many existing studies resort to sampling techniques that draw random nodes from an online social network through its restrictive web/API interface. While almost all of these techniques use the exact same underlying technique of random walk a Mark...
متن کاملThe Comparison of Multi-variable Linear Regression and Artificial Neutral Networks in Tax Evasion of Legal Persons in Iranian Tax System
Tax evasion is one of the most important problems of tax system in the most countries around the world. It covers any unlawful attempt to avoid paying taxes. In present study, the affective factors on tax evasion based on experts’ views were extracted by using Delphi method, so we identified 29 factors and finally 16 factors were extracted based on measurement ability among them. The statistica...
متن کاملSampling from social networks’s graph based on topological properties and bee colony algorithm
In recent years, the sampling problem in massive graphs of social networks has attracted much attention for fast analyzing a small and good sample instead of a huge network. Many algorithms have been proposed for sampling of social network’ graph. The purpose of these algorithms is to create a sample that is approximately similar to the original network’s graph in terms of properties such as de...
متن کاملRevisiting network scanning detection using sequential hypothesis testing
Network scanning is a common, effective technique to search for vulnerable Internet hosts and to explore the topology and trust relationships between hosts in a target network. Given that the purpose of scanning is searching for responsive hosts and network services, behaviour-based scanning detection techniques based on the state of inbound connection attempts remain effective against evasion....
متن کاملTracking performance of incremental LMS algorithm over adaptive distributed sensor networks
in this paper we focus on the tracking performance of incremental adaptive LMS algorithm in an adaptive network. For this reason we consider the unknown weight vector to be a time varying sequence. First we analyze the performance of network in tracking a time varying weight vector and then we explain the estimation of Rayleigh fading channel through a random walk model. Closed form relations a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 5 شماره
صفحات -
تاریخ انتشار 2018